Boosting on a Budget: Sampling for Feature-Efficient Prediction

Author

  • Lev Reyzin
Abstract

In this paper, we tackle the problem of feature-efficient prediction: classification using a limited number of features per test example. We show that modifying an ensemble classifier such as AdaBoost, by sampling hypotheses from its final weighted predictor, is well-suited for this task. We further consider an extension of this problem, where the costs of examining the various features can differ from one another, and we give an algorithm for this more general setting. We prove the correctness of our algorithms and derive bounds for the number of samples needed for given error rates. We also experimentally verify the effectiveness of our methods.
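The idea of sampling hypotheses from a boosted ensemble's weighted predictor can be illustrated with a minimal sketch. It is not the paper's algorithm as published. It assumes the weak hypotheses are decision stumps that each read one feature, that a stump is drawn with probability proportional to its weight, that drawn stumps cast unweighted votes, and that sampling stops when the next draw would need a feature beyond the budget. The names `Stump`, `full_predict`, `sampled_predict`, and `max_draws` are hypothetical, and feature costs are taken to be uniform.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """Hypothetical weak hypothesis: a threshold test on a single feature."""
    feature: int      # index of the one feature this stump reads
    threshold: float
    polarity: int     # +1 or -1
    alpha: float      # non-negative weight in the boosted ensemble

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def full_predict(stumps, x):
    """Standard weighted vote; may read every feature the ensemble uses."""
    score = sum(s.alpha * s.predict(x) for s in stumps)
    return 1 if score >= 0 else -1


def sampled_predict(stumps, x, budget, rng, max_draws=1000):
    """Vote over stumps sampled in proportion to alpha, reading at most
    `budget` distinct features (assumed stopping rule, uniform costs)."""
    weights = [s.alpha for s in stumps]
    seen = set()   # features examined so far for this test example
    votes = 0
    for _ in range(max_draws):  # cap draws so the loop always terminates
        s = rng.choices(stumps, weights=weights, k=1)[0]
        if s.feature not in seen:
            if len(seen) >= budget:
                break           # the next feature would exceed the budget
            seen.add(s.feature)
        votes += s.predict(x)   # each sampled stump casts an unweighted vote
    return (1 if votes >= 0 else -1), seen


stumps = [Stump(0, 0.5, 1, 2.0), Stump(1, 0.0, 1, 1.0), Stump(2, -1.0, 1, 0.5)]
x = [1.0, 3.0, 0.0]
label, seen = sampled_predict(stumps, x, budget=2, rng=random.Random(0))
```

Because stumps with larger weights are drawn more often, the unweighted vote over the sample approximates the weighted vote in expectation, while `seen` never grows past the budget.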


Similar resources

Training-Time Optimization of a Budgeted Booster

We consider the problem of feature-efficient prediction – a setting where features have costs and the learner is limited by a budget constraint on the total cost of the features it can examine in test time. We focus on solving this problem with boosting by optimizing the choice of base learners in the training phase and stopping the boosting process when the learner’s budget runs out. We experi...


Cold Start Purchase Prediction with Budgets Constraints

The IJCAI-16 contest, Brick-and-Mortar Store Recommendation with Budget Constraints, is about recommending nearby brick-and-mortar stores to buyers. The main task of this competition is predicting purchases at nearby stores when users enter new areas they rarely visited in the past. The contest has two novelties: first, given huge amount of online user behavior with on-site shopping record of m...


Comparing Two Approaches for Adding Feature Ranking to Sampled Ensemble Learning for Software Quality Estimation

High dimensionality and class imbalance are two main problems that affect the quality of training datasets in software defect prediction, resulting in inefficient classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original data set. Data sampling alters the data s...


Amazon Employee Access Control System

In this work, based on historical data from 2010-2011 from Amazon Inc., we build a system that aims to take the place of resource administrators at Amazon. Our analysis shows that the given dataset is highly imbalanced with categorical values. Thus in the preprocessing step, we tried different sampling methods, feature selection as well as one hot encoding to make the data more suitable for predi...


Enhanced Cost Sensitive Boosting Network for Software Defect Prediction

plays an important role in reducing the costs of software development and maintaining the high quality of software systems. The early prediction of defect-proneness of the modules can allow software developers to allocate the limited resources on those defect-prone modules such that high quality software can be produced on time and within budget. It is a great challenge to address the class-imba...




Publication date: 2011